Below we display our sessionInfo().
sessionInfo(package=NULL)
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.3.2
## [5] htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.10 stringi_1.1.2
## [9] rmarkdown_1.3 knitr_1.15.1 stringr_1.1.0 digest_0.6.11
## [13] evaluate_0.10
For both datasets we run the relevant ETL operations. We clean the data by first removing special characters (e.g. - ~) from the column names. We then decide which columns are measures and which are dimensions.
For dimensions, we change NA to an empty string, change “&” to “and”, change “:” to “;”. We get rid of " and ’.
For measures, we change NA to 0. We get rid of all characters except for numbers and the - sign.
## [1] "Population"
## [1] "Employment"
## [1] "Unemployment"
## [1] "Unemployment.rate"
## [1] "Marginally.Food.Insecure"
## [1] "Food.Insecure"
## [1] "Very.Low.Food.Secure"
## [1] "Gross.State.Product"
## [1] "Number.of.low.income.uninsured.children"
## [1] "Percent.Low.Income.Unisured.Children"
## [1] "Personal.income"
## [1] "Workers..compensation"
## CREATE TABLE 01_Dataclean_employment (
## -- Change table_name to the table name you want.
## state_name varchar2(4000),
## state varchar2(4000),
## year varchar2(4000),
## Population number(38,4),
## Employment number(38,4),
## Unemployment number(38,4),
## Unemployment.rate number(38,4),
## Marginally.Food.Insecure number(38,4),
## Food.Insecure number(38,4),
## Very.Low.Food.Secure number(38,4),
## Gross.State.Product number(38,4),
## Number.of.low.income.uninsured.children number(38,4),
## Percent.Low.Income.Unisured.Children number(38,4),
## Personal.income number(38,4),
## Workers..compensation number(38,4)
## );
## state_name state year Population
## Alabama : 5 1 : 5 2010:51 Min. : 564516
## Alaska : 5 10 : 5 2011:51 1st Qu.: 1623796
## Arizona : 5 11 : 5 2012:51 Median : 4382667
## Arkansas : 5 12 : 5 2013:51 Mean : 6158836
## California: 5 13 : 5 2014:51 3rd Qu.: 6789176
## Colorado : 5 14 : 5 Max. :38792291
## (Other) :225 (Other):225
## Employment Unemployment Unemployment.rate
## Min. : 283744 Min. : 11152 Min. : 2.700
## 1st Qu.: 730472 1st Qu.: 54283 1st Qu.: 6.000
## Median : 1877812 Median : 157581 Median : 7.300
## Mean : 2796855 Mean : 244360 Mean : 7.368
## 3rd Qu.: 3235551 3rd Qu.: 288906 3rd Qu.: 8.650
## Max. :17348645 Max. :2244326 Max. :13.500
##
## Marginally.Food.Insecure Food.Insecure Very.Low.Food.Secure
## Min. :11.71 Min. : 7.883 Min. :2.036
## 1st Qu.:22.16 1st Qu.:13.051 1st Qu.:4.402
## Median :25.47 Median :15.394 Median :5.378
## Mean :25.45 Mean :15.362 Mean :5.368
## 3rd Qu.:28.50 3rd Qu.:17.266 3rd Qu.:6.159
## Max. :41.08 Max. :25.224 Max. :9.197
##
## Gross.State.Product Number.of.low.income.uninsured.children
## Min. : 26570 Min. : 1.00
## 1st Qu.: 76363 1st Qu.: 14.00
## Median : 190304 Median : 44.00
## Mean : 315502 Mean : 83.01
## 3rd Qu.: 404486 3rd Qu.: 85.00
## Max. :2311616 Max. :843.00
##
## Percent.Low.Income.Unisured.Children Personal.income
## Min. : 0.700 Min. :2.561e+07
## 1st Qu.: 3.000 1st Qu.:6.498e+07
## Median : 4.100 Median :1.664e+08
## Mean : 4.756 Mean :2.685e+08
## 3rd Qu.: 6.100 3rd Qu.:3.427e+08
## Max. :15.000 Max. :1.978e+09
##
## Workers..compensation
## Min. : 7907
## 1st Qu.: 36316
## Median : 110973
## Mean : 298914
## 3rd Qu.: 246610
## Max. :2443512
##
## state_name state year Population Employment Unemployment
## 5 California 5 2010 37334079 16091945 2244326
## 56 California 5 2011 37700034 16258133 2156967
## 107 California 5 2012 38056055 16602672 1921121
## 158 California 5 2013 38414128 16958735 1665590
## 209 California 5 2014 38792291 17348645 1406380
## Unemployment.rate Marginally.Food.Insecure Food.Insecure
## 5 12.2 29.61473 18.56144
## 56 11.7 31.61105 19.07340
## 107 10.4 27.89361 16.48582
## 158 8.9 25.53890 15.72165
## 209 7.5 24.16489 13.69295
## Very.Low.Food.Secure Gross.State.Product
## 5 5.71067 1953411
## 56 6.10045 2030468
## 107 5.79252 2125717
## 158 5.06253 2202678
## 209 4.15612 2311616
## Number.of.low.income.uninsured.children
## 5 763
## 56 770
## 107 653
## 158 488
## 209 341
## Percent.Low.Income.Unisured.Children Personal.income
## 5 7.8 1617134250
## 56 7.8 1727433579
## 107 6.7 1838567162
## 158 5.0 1861956514
## 209 4.0 1977923740
## Workers..compensation
## 5 2067143
## 56 2062255
## 107 2042670
## 158 1990609
## 209 2072792
## [1] "Population"
## [1] "Violent.crime.total"
## [1] "Murder.and.nonnegligent.Manslaughter"
## [1] "Legacy.rape..1"
## [1] "Revised.rape..2"
## [1] "Robbery"
## [1] "Aggravated.assault"
## [1] "Property.crime.total"
## [1] "Burglary"
## [1] "Larceny.theft"
## [1] "Motor.vehicle.theft"
## [1] "Violent.Crime.rate"
## [1] "Murder.and.nonnegligent.manslaughter.rate"
## [1] "Legacy.rape.rate..1"
## [1] "Revised.rape.rate..2"
## [1] "Robbery.rate"
## [1] "Aggravated.assault.rate"
## [1] "Property.crime.rate"
## [1] "Burglary.rate"
## [1] "Larceny.theft.rate"
## [1] "Motor.vehicle.theft.rate"
## CREATE TABLE 01_Dataclean_crime (
## -- Change table_name to the table name you want.
## State varchar2(4000),
## Year varchar2(4000),
## Population number(38,4),
## Violent.crime.total number(38,4),
## Murder.and.nonnegligent.Manslaughter number(38,4),
## Legacy.rape..1 number(38,4),
## Revised.rape..2 number(38,4),
## Robbery number(38,4),
## Aggravated.assault number(38,4),
## Property.crime.total number(38,4),
## Burglary number(38,4),
## Larceny.theft number(38,4),
## Motor.vehicle.theft number(38,4),
## Violent.Crime.rate number(38,4),
## Murder.and.nonnegligent.manslaughter.rate number(38,4),
## Legacy.rape.rate..1 number(38,4),
## Revised.rape.rate..2 number(38,4),
## Robbery.rate number(38,4),
## Aggravated.assault.rate number(38,4),
## Property.crime.rate number(38,4),
## Burglary.rate number(38,4),
## Larceny.theft.rate number(38,4),
## Motor.vehicle.theft.rate number(38,4)
## );
This is a barchart with Employment on the Columns and State, filtered by year, on the rows. The black line is the state aggregated employment average, so the bars that pass the black line show that during that year, the employment number passed the employment state average for all the years. The red and blue color coding represents whether the employment number passed the national aggregated employment average. The difference between a certain year’s employment in that state and the national average is shown as the numbers next to the bars.
The graph is a ID set, the graph shows the gross state products among all 50 states. There are two sets, one is high gross state product, and another one medium gross state product. The darker the blue dot means a higher gross state product percentage. One can see from the graph that northeastern states have higher gross state product percentages.
This is a map of all the states with their worker’s compensation vs income. The darker the color of the state means that the employees got more worker’s compensation for the amount they made. It’s interesting to note that working on the west region of the United States gives the best worker’s compensation. Working in the central region of the United States results in poor worker’s compensation for the income rate.
The above shiny visulization shows the same results as the Tableau version of the barchart.
The above shiny visulization shows the same results as the Tableau version of the ID set. It colors each state based on high or low GSP.
The above shiny visulization shows the same results as the Tableau version of the map.
Here is the website to the shiny application : ???